NSF PAR Search | NSF Public Access Repository

The Piranha Problem: Large Effects Swimming in a Small Pond

https://doi.org/10.1090/noti3044

Tosh, Christopher; Greengard, Philip; Goodrich, Ben; Gelman, Andrew; Vehtari, Aki; Hsu, Daniel (January 2025, Notices of the American Mathematical Society)

Full Text Available
Bayesian cross-validation by parallel Markov chain Monte Carlo

https://doi.org/10.1007/s11222-024-10404-w

Cooper, Alex; Vehtari, Aki; Forbes, Catherine; Simpson, Dan; Kennedy, Lauren (August 2024, Statistics and Computing)

Brute force cross-validation (CV) is a method for predictive assessment and model selection that is general and applicable to a wide range of Bayesian models. Naive or ‘brute force’ CV approaches are often too computationally costly for interactive modeling workflows, especially when inference relies on Markov chain Monte Carlo (MCMC). We propose overcoming this limitation using massively parallel MCMC. Using accelerator hardware such as graphics processing units, our approach can be about as fast (in wall clock time) as a single full-data model fit. Parallel CV is flexible because it can easily exploit a wide range of data partitioning schemes, such as those designed for non-exchangeable data. It can also accommodate a range of scoring rules. We propose MCMC diagnostics, including a summary of MCMC mixing based on the popular potential scale reduction factor (R-hat) and MCMC effective sample size (ESS) measures. We also describe a method for determining whether an R-hat diagnostic indicates approximate stationarity of the chains, which may be of more general interest for applications beyond parallel CV. Finally, we show that parallel CV and its diagnostics can be implemented with online algorithms, allowing parallel CV to scale up to very large blocking designs on memory-constrained computing accelerators.
Full Text Available
Cross-Validatory Model Selection for Bayesian Autoregressions with Exogenous Regressors

https://doi.org/10.1214/23-BA1409

Cooper, Alex; Simpson, Dan; Kennedy, Lauren; Forbes, Catherine; Vehtari, Aki (January 2024, Bayesian Analysis)
Steel, Mark (Ed.)
Bayesian cross-validation (CV) is a popular method for predictive model assessment that is simple to implement and broadly applicable. A wide range of CV schemes is available for time series applications, including generic leave-one-out (LOO) and K-fold methods, as well as specialized approaches intended to deal with serial dependence such as leave-future-out (LFO), h-block, and hv-block. Existing large-sample results show that both specialized and generic methods are applicable to models of serially dependent data. However, large-sample consistency results overlook the impact of sampling variability on accuracy in finite samples. Moreover, the accuracy of a CV scheme depends on many aspects of the procedure. We show that poor design choices can lead to elevated rates of adverse selection. In this paper, we consider the problem of identifying the regression component of an important class of models of data with serial dependence, autoregressions of order p with q exogenous regressors (ARX(p,q)), under the logarithmic scoring rule. We show that when serial dependence is present, scores computed using the joint (multivariate) density have lower variance and better model selection accuracy than the popular pointwise estimator. In addition, we present a detailed case study of the special case of ARX models with fixed autoregressive structure and variance. For this class, we derive the finite-sample distribution of the CV estimators and the model selection statistic. We conclude with recommendations for practitioners.
Full Text Available
Fast Methods for Posterior Inference of Two-Group Normal-Normal Models

https://doi.org/10.1214/22-BA1329

Greengard, Philip; Hoskins, Jeremy; Margossian, Charles C; Gabry, Jonah; Gelman, Andrew; Vehtari, Aki (September 2023, Bayesian Analysis)

Full Text Available
R-squared for Bayesian Regression Models

https://doi.org/10.1080/00031305.2018.1549100

Gelman, Andrew; Goodrich, Ben; Gabry, Jonah; Vehtari, Aki (December 2018, The American Statistician)

Full Text Available
Visualization in Bayesian Workflow

https://doi.org/10.1111/rssa.12378

Gabry, Jonah; Simpson, Daniel; Vehtari, Aki; Betancourt, Michael; Gelman, Andrew (January 2019, Journal of the Royal Statistical Society Series A: Statistics in Society)

Bayesian data analysis is about more than just computing a posterior distribution, and Bayesian visualization is about more than trace plots of Markov chains. Practical Bayesian data analysis, like all data analysis, is an iterative process of model building, inference, model checking and evaluation, and model expansion. Visualization is helpful in each of these stages of the Bayesian workflow and it is indispensable when drawing inferences from the types of modern, high dimensional models that are used by applied researchers.
Using Stacking to Average Bayesian Predictive Distributions (with Discussion)

https://doi.org/10.1214/17-BA1091

Yao, Yuling; Vehtari, Aki; Simpson, Daniel; Gelman, Andrew (September 2018, Bayesian Analysis)

Full Text Available
Bayesian aggregation of average data: An application in drug development

https://doi.org/10.1214/17-AOAS1122

Weber, Sebastian; Gelman, Andrew; Lee, Daniel; Betancourt, Michael; Vehtari, Aki; Racine-Poon, Amy (September 2018, The Annals of Applied Statistics)

Full Text Available
